Building Domain-Speci c Search Engines with Machine Learning Techniques

نویسندگان

  • Andrew McCallum
  • Kamal Nigam
  • Jason Rennie
  • Kristie Seymore
چکیده

Domain-speci c search engines are growing in popularity because they o er increased accuracy and extra functionality not possible with the general, Web-wide search engines. For example, www.campsearch.com allows complex queries by age-group, size, location and cost over summer camps. Unfortunately these domain-speci c search engines are di cult and timeconsuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-speci c search engines. We describe new research in reinforcement learning, information extraction and text classi cation that enables e cient spidering, identifying informative text segments, and populating topic hierarchies. Using these techniques, we have built a demonstration system: a search engine for computer science research papers. It already contains over 50,000 papers and is publicly available at www.cora.justresearch.com.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Machine Learning Approach to Building Domain-Speci c Search Engines

Domain-speci c search engines are becoming increasingly popular because they o er increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also di cult and timeconsuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-speci c search engines. We describe new...

متن کامل

A Machine Learning Approach to Building Domain-Specific Search Engines

Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and timeconsuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describ...

متن کامل

Building Domain-Specific Search Engines with Machine Learning Techniques

Domain-specific search engines are growing in popularity because they offer increased accuracy and extra functionality not possible with the general, Web-wide search engines. For example, www.campsearch.com allows complex queries by age-group, size, location and cost over .summer camps. Unfortunately these domain-specific search engines are difficult and timeconsuming to maintain. This paper pr...

متن کامل

Keyword Spices: A New Method for Building Domain-Specific Web Search Engines

This paper presents a new method for building domain-specific web search engines. Previous methods eliminate irrelevant documents from the pages accessed using heuristics based on human knowledge about the domain in question. Accordingly, they are hard to build and can not be applied to other domains. The keyword spice method, in contrast, improves search performance by adding domain-specific k...

متن کامل

Searching the Web: learning based techniques

Searching and retrieving information from the Web poses new issues that can be e ectively tackled by applying machine learning techniques. In particular, the fast dynamics of the information available on the Internet requires new approaches for indexing; the huge amount of available data is hardly manageable by humans and, on the other hand, can provide large sets of examples for learning algor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999